SPECIFICATION



BINARY SEARCH TREES AND METHODS FOR ESTABLISHING AND
OPERATING THEM



Field of the Invention 

The present invention relates generally to binary search trees and methods of operating
them, particularly although not exclusively to enable look-ups in packet-based
communication networks, so as for example to provide forwarding information (such as a
port number of a switch) for a packet of which the address is stored in a lookup

Background to Invention 

One of the biggest bottlenecks in a switch design is the lookup process. A process called
hashing is commonly adopted, whereby (for example) the MAC address of a device is
encoded to a smaller value and stored in memory, using that value as an address. Thereby,
rather than implementing a search of a 48-bit number, the hashed value is used to retrieve the
stored MAC address. However, often at least two different MAC addresses can hash to the
same value and a linked list of entries under the same hashed MAC address must be
constructed, imposing an unpredictable latency on the switch. Statistically, the list increases
rapidly the smaller the amount of memory available to store the addresses. One could have
an example where there is a long list of entries at one hashed address despite the fact that
there may be hashed addresses that have not yet been used.
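The collision problem described above can be illustrated with a minimal sketch. The XOR-folding hash, the 8-bit table size and the function names are assumptions chosen for illustration; the specification does not say which hash function a switch would use.

```python
def hash_mac(mac, table_bits):
    """Fold a 48-bit MAC address into table_bits bits by XOR (an assumed hash)."""
    mask = (1 << table_bits) - 1
    hashed = 0
    while mac:
        hashed ^= mac & mask
        mac >>= table_bits
    return hashed


def build_table(macs, table_bits):
    """Store each MAC in the list (chain) kept at its hashed address."""
    table = [[] for _ in range(1 << table_bits)]
    for mac in macs:
        table[hash_mac(mac, table_bits)].append(mac)
    return table


def lookup_steps(table, mac, table_bits):
    """Number of comparisons needed to find mac in its chain."""
    chain = table[hash_mac(mac, table_bits)]
    return chain.index(mac) + 1


# Three different addresses that all fold to the same hashed value.
macs = [0x000000000001, 0x000000000100, 0x000000010000]
table = build_table(macs, 8)
```

Here all three addresses land in one chain while every other hashed address stays unused, so the number of comparisons depends on the order of arrival rather than on the table size.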

A binary search tree is a data structure adopted predominantly in software, that yields an
efficient algorithm for searching through a large database for a particular entry. The
structure has a few key characteristics. For convenience, each location in the tree is termed a
'node' and the information contained therein is called an 'element'.

Every binary search tree has a unique root node separating two other binary search trees.
These two binary trees are disjoint from each other and from the root node and are called the
left and right subtrees of the root. These subtrees are themselves binary search trees in their
own right. Each time one moves from a node to one of its subtrees, a 'level' in the tree is
traversed. If a binary search tree has L levels, the root node is the only node to be a member
of level L, the uppermost level. A full binary tree with L levels has (2^L)-1 nodes.

Inherent in the levels of the tree is a hierarchical structure. For any node at some level X
there is a unique 'parent' node at level X+1 and two 'children' at a level X-1.

Searching the tree can be optimised by the following rules applied at any node with element
B:

a) If A is ANY element in the left subtree of B, then A is less than B.

b) If C is ANY element in the right subtree of B, then C is greater than B.

For a given number of nodes there is an associated minimum tree depth that yields
maximally efficient searches for any given element of that tree. A tree whose depth is
minimal for a given number of entries is called a 'balanced' tree. However, in compiling a
tree from a succession of, for example, randomly occurring values within a range that is to be
encompassed by the tree, the tree is likely to be unbalanced, especially if implemented in
hardware.

If a binary tree is (as is usual) implemented in software the hierarchy of the tree is formed
by linking nodes, with pointers, to other nodes. Along with the actual element, two pointers
are also contained at every node, one to the memory location of the left subtree and one to
the right subtree. As usual, 'pointers' are simply memory addresses. In hardware, storing
this kind of information occupies unnecessary area on a chip, often a more critical aspect of
design than the latency of a lookup process. However, latency is also an important
consideration in hardware designs, especially in time critical applications.
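As a minimal sketch of the conventional software structure just described, the following shows a node holding an element and two stored pointers, and how the depth of the tree depends on the order in which elements arrive. The class and function names and the sample values are illustrative assumptions, not part of the specification.

```python
class Node:
    """A software tree node: one element plus two stored pointers."""

    def __init__(self, element):
        self.element = element
        self.left = None   # pointer to the left subtree
        self.right = None  # pointer to the right subtree


def insert(root, element):
    """Insert element below root with no rebalancing; return the root."""
    if root is None:
        return Node(element)
    node = root
    while True:
        if element < node.element:
            if node.left is None:
                node.left = Node(element)
                return root
            node = node.left
        elif element > node.element:
            if node.right is None:
                node.right = Node(element)
                return root
            node = node.right
        else:
            return root  # element already present


def build(elements):
    root = None
    for element in elements:
        root = insert(root, element)
    return root


def depth(node):
    """Number of levels below and including node."""
    if node is None:
        return 0
    return 1 + max(depth(node.left), depth(node.right))


balanced = build([50, 20, 90, 10, 30, 70, 95])  # middle value arrives first
unbalanced = build([90, 50, 27, 20, 10])        # descending arrival order
```

With the middle value inserted first the seven elements occupy three levels, whereas five elements arriving in descending order produce a chain five levels deep.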



Summary of the Invention 



The present invention is based on the dynamic construction of pointers to neighbouring
nodes using simple logic, saving the area otherwise needed to store these pointers statically
for every node in the tree.

A tree implemented in accordance with the invention is always balanced because every new
element is inserted at the "highest" available node in the hierarchy of the tree. Thus for a
full tree of L levels there is a worst case of L possible comparisons before the search
element can be located.

A binary search tree according to the invention offers a deterministic minimal search latency
for any element (such as a MAC address) stored in one of its nodes because (a) every
memory address points to one other element only; and (b) the search algorithm adopted to
find an address in the database has a fixed worst case value fixed by the size of the lookup
memory available. The invention is therefore particularly suited for implementation in
hardware with minimal memory.

Further objects and features of the invention will be apparent from the following detailed
description with reference to the accompanying drawings.

Brief Description of the Drawings 

Figure 1 is a schematic illustration of a network switch.

Figure 2 illustrates a software structured balanced binary search tree.

Figure 3 illustrates a software structured unbalanced binary search tree.

Figure 4 illustrates part of an insertion process.

Figure 5 illustrates the result of a shuffling process.

Figure 6 illustrates memory locations restructured as a binary search tree.

Figure 7 illustrates the general structure of a binary search tree with level decode.

Figure 8 illustrates a searching flow algorithm.

Figure 9 illustrates neighbouring node locations in a binary search tree.

Figure 10 illustrates a search path for finding a highest available new node.

Figures 11 and 12 illustrate a flow chart algorithm for inserting new elements.

Figure 13 illustrates a flow chart for a deletion algorithm.

Detailed Description

Figure 1 illustrates, for the sake of a specific example, one form of device within which a
binary search tree may be used in accordance with the invention. The example given is of an
otherwise well known form of switch which can be used in a packet-based communication
system, conforming for example to an Ethernet protocol and particularly IEEE Standard
802.3 (1998 Edition). In view of the generally well known nature of the switch, which in
practice is of considerable complexity, the switch shown in Figure 1 has been deliberately
simplified.

The switch 1 shown in Figure 1 has a multiplicity of ports 2. In practice there are many
more ports than the four shown. These ports are capable of receiving addressed data packets
or other frames from a communication medium and also transmitting addressed data
packets. The ports are coupled to media access control (MAC) devices 2 which are intended
to be in well known form. Packets received by any of the ports are, after appropriate
pre-processing, passed by way of a memory bus system, coupled to a CPU 5, to a memory
controller 5 controlling reading and writing in a memory 7 which may be located 'on-chip'
but may be off-chip according to preference. Coupled to the bus system is a look-up engine
8 having access to a look-up database 9. In practice the look-up engine 8 may be part of the
processing system represented by the CPU 5 but depending on the organisation of the switch
there may be a multiplicity of look-up engines, a multiplicity of processors and so on. A
typical example of a modern, complex multi-chip switch, wherein there are mutually
coupled look-up engines and network processors, is described in the earlier application of
O'Callaghan et al., Serial No. 09/818,670 filed 28 March 2001 and commonly assigned
herewith filed 30 January 2001. Another form of switch is described in application of
Creedon et al., filed 29 June 2001, and entitled "ASIC SYSTEM ARCHITECTURE
INCLUDING DATA AGGREGATION TECHNIQUE" and commonly assigned herewith.

One process which the switch commonly has to perform is a look-up based on address data
within a received packet in order to determine the destination or group of destinations to
which a packet or replicas of a packet should be forwarded. The look-up database 9 is used
for this purpose. Typically it comprises a multiplicity of entries at specific memory
locations, the entries including 'associated data' which (among other things) includes the
forwarding data for the relevant packet. Typically the forwarding data identifies, for
example by means of a bit mask, the particular ports from which a packet having the
particular destination address should be forwarded. This look-up process is commonly
known as a 'destination address look-up'.

A look-up database of this nature is commonly at least partly established by performing an
additional look-up, known as a source address look-up, wherein, for example, the look-up
database is examined to see whether there is an entry corresponding to the source address of
an incoming packet. If there is no such entry a new entry can be made including an
identification of the port on which the packet having that source address was received.
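The two look-ups described above can be sketched as follows. A Python dictionary stands in for the look-up database 9 purely for illustration; the class name, method names and the choice to return None for an unknown destination are assumptions and are not taken from the specification.

```python
class LookupDatabase:
    """Illustrative stand-in for a look-up database keyed by MAC address."""

    def __init__(self):
        self.entries = {}  # MAC address -> forwarding bit mask of ports

    def source_lookup(self, source_mac, ingress_port):
        """Learn the source address if no entry for it exists yet."""
        if source_mac not in self.entries:
            self.entries[source_mac] = 1 << ingress_port

    def destination_lookup(self, destination_mac):
        """Return the port bit mask for a known address, else None."""
        return self.entries.get(destination_mac)


database = LookupDatabase()
database.source_lookup(0x0000AA, ingress_port=3)   # new entry is made
database.source_lookup(0x0000AA, ingress_port=1)   # entry exists, unchanged
```

A real switch would also apply the forwarding rules, VLAN membership and similar considerations; the sketch covers only the basic learning and look-up steps.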

It will be understood by those skilled in the art that this is a brief description of a process
which can involve other operations, having regard to forwarding rules in the network,
VLAN membership, spanning tree algorithms, trunking rules and so on.

The relationship between a packet and the entry in the look-up database is exemplified as
follows. A packet usually has a preamble, MAC address data, typically including a 48-bit
destination address and a 48-bit source address, a further section, which may include
control data, VLAN data, network (layer 3) addresses and so on, a message section
(comprising the data 'payload' of the packet) and a cyclic redundancy code section.
Typically, there is a search made on part of the MAC address data to locate a corresponding
element or entry in the look-up database, this entry providing access to associated data.

There is quite a variety of ways in which the search can be organised, both in hardware and
software. The principal problem is that the address data or search key is commonly very
long (typically 48 bits) and the number of different entries that may have to be made may be
very large. Considerable effort has been devoted to the achieving of efficient, large capacity,
search structures and algorithms.

In these and other circumstances the binary search tree offers a convenient and deterministic
minimal search latency because every memory address is associated with a single MAC
address and the search algorithm has a fixed worst case value fixed by the size of the
look-up memory available.

Figure 2 schematically represents a balanced binary search tree which is structured in
software. Each node (except the final 'leaf' nodes) has two pointers to the left and right. A
binary search is made by comparing the value of the key with the element at the root node.
The search terminates if the key is equal to that element, which is 50 in the example. If the
key is not equal to the element then one or other of the nodes identified by the pointers is
accessed next, depending on whether the key is greater or less than the element at the
examined node.

Each entry includes or has a pointer to associated data (not shown in Figure 2) as well as a
left pointer and a right pointer, which in accordance with ordinary practice are merely
addresses in which the elements of the adjacent nodes are stored.

It is desirable to achieve a balanced tree in order to minimise the number of operations
required to achieve a match between the key and the address data in the entry.

It should be understood, in relation to Figure 2, that the actual location in memory of the
entries shown is unimportant except for the root node. A search is made by comparing the
address (or key) with the element in the root node. If there is identity, the search ends
immediately. The search proceeds down the left tree if the key is less than the element in the
root node and down the right tree if the key is greater than the element in the root node.
Thus for example when searching the tree in Figure 2, if the key is less than 50 the next
stage of the search is directed by the left pointer to the entry '20'. On the contrary, if the
address key is greater than '50' the right pointer will be used to access the node shown with
element '90' and a further stage of comparison occurs and so on.

Figure 2 is balanced, with the same depth of tree both to the left and to the right of the root
node, because it is constructed ex post facto, it being known that '50' is the middle value of
the elements. In practice, unbalance occurs owing to the fact that the entries may have to
be compiled in an uncontrolled order. This is shown in Figure 3. The root node is
established first and contains the element 90. Thus it will be seen that the number of nodes
and the depth of the tree is greater on the left-hand side of Figure 3 than the right-hand side.
If for example the next entry had an element '24', the tree would be traversed down to the
node having address key '27' and would be established using the left pointer available for
that node, making the tree further unbalanced. The unbalance will occur if the elements that
are established after the root node are preponderantly greater (or less) than the element at the
root node.

In order to alleviate delays caused by unbalanced trees, it is known to "shuffle" the elements
in the nodes. This is shown in Figures 4 and 5. In Figure 4, node 40 contains element '50'
and node 41 contains element '20'. Node 42, the other 'child' of node 40, is the highest (in
terms of levels) available node. If the next element to be stored is '2', less than the element
in node 41, it will be put in one of the child nodes of node 41, tending to unbalance the
tree. It is known to perform a shuffling operation as shown in Figure 5, wherein node 41
becomes the root node and node 40 a child of the root node. The new element '2' is put into
node 43 and the tree is balanced.

Figure 6 illustrates an array 60 of standard hardware memory locations, each defined by a
multiple binary word. The array 60 of memory locations can be organised as a binary tree
61 from a root node 62. The tree 61 has nodes corresponding to the addresses in array 60
and except for the leaf nodes (i.e. at the lowest level) each node has two child nodes of
which the addresses can simply be computed from their parent node.



Figure 7 illustrates a binary search tree which is provided with an explicit hierarchy
represented by a 'level decode'. Each address has a binary size of L bits, usually
represented conventionally as [L-1:0]. Shown adjacent the tree 71 in Figure 7 is a decoding
scheme 72 identifying each level from 0 to L with the address of the first node at the
respective level. Owing to its binary nature, a tree with L levels may be constructed with a
number of locations corresponding to (2^L)-1.

It should be noted that in Figure 7 the tree 71 has a predetermined structure such that for
each level, each node has two child nodes of which the right-hand node has an address
greater than the address of the left-hand node and each is simply computable from the
address of the parent node. Thus for example the address [0100...0] of node 72 can generate
the address [0010...0] of node 73 and the address [0110...0] of node 74 by diminishing and
augmenting respectively the address of node 72 by the binary value 2^m where m is one fewer
than the level of the parent node, in accordance with the ordering of a binary tree.
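The address arithmetic just described can be sketched as below. The function name and the convention that leaf nodes sit at level 0 follow the later parts of this description; the four-bit example address is an assumption chosen to match the pattern [0100...0].

```python
def child_addresses(parent_address, parent_level):
    """Return (left, right) child addresses of a node at parent_level."""
    if parent_level < 1:
        raise ValueError("a node at level 0 has no children")
    step = 1 << (parent_level - 1)  # 2**m, m one fewer than the parent's level
    return parent_address - step, parent_address + step


# A node at address 0100 on level 2 of a tree with four-bit addresses.
left, right = child_addresses(0b0100, 2)
```

No pointer is stored anywhere; both child addresses follow from the parent address and its level alone.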

Knowing the level for any node of interest allows the calculation of 'pointers' to
neighbouring nodes. Knowing these will in turn allow the implementation of simple
algorithms which are used to search for an element and for inserting and deleting new
elements in the tree and which can be performed by hardware logic. These pointers are not
stored on a per node basis; they are dynamically calculated when needed.

Accordingly if the tree is structured as shown in Figure 7, then the flow algorithm
employed in Figure 8 may be used when searching for an element. In this and the other flow
diagrams a single equals sign represents the action 'set (parameter) equal to (stated value)'
so that for example in the first stage 80 the variable 'current node' is set to the root node
and the variable 'current level' is set to the number of levels. The double equals sign
represents the discovery of identity, so that for example in stage 83, a test is made to
determine whether the current level is identical to zero.

According to the flow algorithm in Figure 8, the comparison stage 81 discovers whether the
current element is identical to the key or search element. If the current element is identical to
the search element then the element has been found (stage 82) and, in the specific example,
the associated data is retrieved. If the current element is not identical to the search element,
then a test is made (stage 83) whether the current level is identical to zero. If it is, then the
search has been completed unsuccessfully and the element is not found in the search table. If
the current level is not identical to zero, the next test (85) is whether the current element is
greater or less than the search element. This is the basic stage of a binary search tree. The
search is then directed either to the left child node or the right child node according to
whether the current element is greater than or less than the search element. By virtue of the
structure just discussed, the 'current' node is set to the appropriate child node and the
'current' level is decreased by unity. The search reverts to stage 81 and so on.
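The search flow of Figure 8 can be sketched over a plain array as follows. This is a sketch under stated assumptions: the root is taken to sit at level `root_level` with address `1 << root_level`, location 0 of the array is unused, and the names `search`, `memory` and `root_level` are illustrative rather than taken from the figures.

```python
def search(memory, root_level, key):
    """Return the address holding key, or None if the key is not found."""
    current_node = 1 << root_level   # stage 80: start at the root node
    current_level = root_level
    while True:
        current_element = memory[current_node]
        if current_element == key:   # stages 81 and 82: element found
            return current_node
        if current_level == 0:       # stage 83: lowest level reached
            return None
        step = 1 << (current_level - 1)
        if key < current_element:    # stage 85: choose the child node
            current_node -= step     # left child
        else:
            current_node += step     # right child
        current_level -= 1


# A full three-level tree; addresses 1 to 7 hold the elements in order.
memory = [0, 10, 20, 30, 40, 50, 60, 70]
```

The loop body runs at most once per level, which is the fixed worst case referred to in the summary.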

It is worth mentioning here that the tree structure and the level decoding allow the
computation of nodes neighbouring a randomly selected node and also allow the
determination whether a node is the first or last at a given level.

Consider a randomly selected node, n, at memory address currentNode[L:0].

a) currentLevelDecode. Knowing the level at which a node resides, say X, one calculates a
decode of it as follows:

currentLevelDecode[L:X+1] = 0
currentLevelDecode[X] = 1'b1
currentLevelDecode[X-1:0] = 0

b) The parent node is the node immediately above a randomly selected node. The only node
not to have a parent is the root node.

parent[L:0] = ((~currentLevelDecode[L:0]) & currentNode[L:0]) | (previousLevelDecode[L:0])

c) The RightChildNode is the node which resides to the right of a randomly selected node.
The element of the right child node is greater than the element contained at node n.

rightChildNode[L:0] = currentNode[L:0] | nextLevelDecode[L:0]



d) The LeftChildNode is the node which resides to the left of a randomly selected node.
The element of the left child node is less than the element contained at node n.

leftChildNode[L:0] = currentNode[L:0] - nextLevelDecode[L:0]

e) previousLevelDecode = the level at which a node's parent resides, decoded
previousLevelDecode[L:0] = currentLevelDecode[L:0] << 1

f) nextLevelDecode = the level at which a node's child/children reside, decoded
nextLevelDecode[L:0] = currentLevelDecode[L:0] >> 1

g) nextLocationOnLevel. The next location on a level is the node directly to the right of n.
nextLocationOnLevel[L:0] = currentLocation[L:0] + previousLevelDecode[L:0]

h) lastNodeAtLevel. The last node on the level at which n resides.

lastNodeAtLevel[0] = currentLevelDecode[0];
lastNodeAtLevel[1] = currentLevelDecode[1] | lastNodeAtLevel[0];
lastNodeAtLevel[2] = currentLevelDecode[2] | lastNodeAtLevel[1];
lastNodeAtLevel[3] = currentLevelDecode[3] | lastNodeAtLevel[2];
...
lastNodeAtLevel[L] = currentLevelDecode[L] | lastNodeAtLevel[L-1];

i) firstNodeAtLevel. The first node on the level at which n resides.
firstNodeAtLevel[L:0] = currentLevelDecode[L:0]
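The relationships a) to i) can be collected into a small set of functions, sketched here with Python integers in place of hardware bit vectors. The function names mirror the signal names above, but the argument conventions (`level`, `root_level`) are assumptions. The left child is computed by subtracting the next-level decode, following the description of Figure 7 in which the parent address is diminished to reach the left-hand child.

```python
def current_level_decode(level):
    """One-hot decode in which only bit `level` is set."""
    return 1 << level


def previous_level_decode(level):
    return current_level_decode(level) << 1


def next_level_decode(level):
    return current_level_decode(level) >> 1


def parent(node, level):
    # Clear the node's own level bit, then set the parent's level bit.
    return (node & ~current_level_decode(level)) | previous_level_decode(level)


def right_child(node, level):
    return node | next_level_decode(level)


def left_child(node, level):
    return node - next_level_decode(level)


def next_location_on_level(node, level):
    return node + previous_level_decode(level)


def first_node_at_level(level):
    return current_level_decode(level)


def last_node_at_level(level, root_level):
    # Bits `level` up to `root_level` are set; all lower bits are clear.
    last = 0
    for bit in range(level, root_level + 1):
        last |= 1 << bit
    return last
```

For a tree whose root is at level 2 (3-bit addresses), the root is at address 4, its children are at 2 and 6, and the lowest level runs from address 1 to address 7 in steps of 2. The functions do not guard against asking for the parent of the root or the children of a level 0 node; a caller is assumed to check the level first.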

The relationships between the nodes are summarized in Figure 9.

Figure 10 illustrates the pattern for the search for new unoccupied nodes when inserting a
new element. The search commences at the root node 101, then proceeds along the next
level (L-1) for nodes 102 and 103, then proceeds along the next level (L-2), that is to say
nodes 104 to 105, and so on to the next level of which the first node is 106 and the last node
of the level is 107. At each level a test can be made in accordance with the foregoing to
determine whether the node is the last node on that selected level so that a next level decode
performed will produce a pointer to the first node (for example 102, 104, 106) on the next
level down. In the Figure, node 108 may be termed the 'base' node, the first node of the
lowest level, and node 109, shown with an address of all ones, is the terminating node.

The remaining Figures are flow diagrams illustrating the operation of hardware logic for (a)
inserting new elements in the binary tree and (b) deleting elements from the tree. In a
specific embodiment the nodes constitute, or form part of, the look-up database 9 in Figure 1
and the logic engine performing the insertion and deletion processes as well as the search
process described in Figure 8 forms part of the look-up engine 8 in Figure 1.

Figure 11 is a flow chart of an algorithm for the insertion of new elements. It commences
with stage 111, in which the 'current' node, that is to say the node in respect of which
operations are being performed, is set to the root node. The other parameter which needs
setting is the 'current level', which is set to the number of levels in the tree (L).

Stage 112 is a test whether the current element (the element stored at the current node) is
zero. If it be zero, the procedure described in Figure 12 will be followed. This is described
later.



If the current element stored at the current node is non-zero, the algorithm tests whether the
current node is the last node at the level (determined as previously described). If the current
node is the last node at the level there is a further test 114 to determine whether the current
level is zero (the lowest level of which node 108 is the base and node 109 is the terminating
node).

If the current level is zero as determined by stage 114, there is no free space, as indicated by
stage 115. The algorithm has reached node 109 as shown in Figure 10.



If the test 114 indicates that the current node is the last node at the level and the current level
is non-zero, then the current level must be decremented (stage 116) and the current node set
to the current level decoded (stage 117). This will direct the insertion process to the first
node in the next level.

If test 113 indicates the current node is not the last node at the level, then the current node is
reset to be the next location on the level, stage 118, and the algorithm reverts to stage 112.
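The level-by-level scan of Figure 10, driven by stages 111 to 118 of Figure 11, can be sketched as follows. The sketch stops at the point where an unoccupied node is found and does not attempt the shuffling of Figure 12. The array layout (root at address `1 << root_level`, location 0 unused, zero meaning unoccupied) and the names used are assumptions for illustration.

```python
EMPTY = 0  # an unoccupied node holds the element zero


def find_highest_free_node(memory, root_level):
    """Return the address of the highest unoccupied node, or None if full."""
    current_node = 1 << root_level          # stage 111: start at the root
    current_level = root_level
    all_ones = (1 << (root_level + 1)) - 1  # address of the terminating node
    while True:
        if memory[current_node] == EMPTY:   # stage 112: free node found
            return current_node
        last_node = all_ones & ~((1 << current_level) - 1)
        if current_node == last_node:       # stage 113: end of this level
            if current_level == 0:          # stages 114 and 115: no free space
                return None
            current_level -= 1              # stage 116
            current_node = 1 << current_level  # stage 117: first node of level
        else:
            current_node += 1 << (current_level + 1)  # stage 118: next location


memory = [0] * 8           # three-level tree, initially unoccupied
first = find_highest_free_node(memory, 2)
memory[4] = 50
second = find_highest_free_node(memory, 2)
memory[2] = 20
third = find_highest_free_node(memory, 2)
```

Because a lower level is only visited once every node above it is occupied, the occupied part of the tree never becomes deeper than necessary.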

Figure 12 illustrates the insertion process in the event that the current node is set to the root
node and the current element is zero. The process in Figure 12 includes a shuffling
algorithm. Stage 120 defines "writeNode" as equal to the current node, in preparation for a
writing operation. Stage 121 is a test for the current node being the root node. If it is, then
the new element is written into the node, stage 134, and the process ends (stage 135).

If the current node is not the root node then a test (123) must be made to determine whether
the parent element (the element stored in the parent of the current node) is greater than the
current element.

If the parent element is greater than the current element, the next test is whether the current
node is equal to the base node, stage 124.

If the current node is not equal to the base node (124), the current node is set (stage 125) to
one fewer than the current node. If the current element is non-zero (stage 126), and the
current element is less than the new element (stage 127), then the write element is the
current element and the write node is the current node (stage 128).

If the parent element is greater than the current element and the current node is not equal to
the terminating node (stage 129), the current node is set (stage 130) to one more than the
current node. If the current element is non-zero (stage 131), and the current element is
greater than the new element (stage 132), then the new element is written (134) and the
process ends (stage 135). If the current element is not greater than the new element then the
write element is set to the current element, the write node is set to the current node (stage
133) and this sub-process recycles.



A specific example of the insertion of four elements in an initially unoccupied tree now
follows. It is assumed that the elements are x1 to x4 where x1 > x2 > x3 > x4. In each case the
stages shown in Figures 11 and 12 are listed, with the result (Yes or No) given for each test
in the path. For the sake of simplicity it is assumed that the search tree has only three levels,
e.g. a root node, level 2, two nodes at level 1 and four nodes at level 0, so that the first node
at the last-mentioned level is the base node. This tree corresponds to nodes 101 to 105 in
Figure 10 (identifiable with 3-bit addresses).

(a) Element x1: stages 111 - 112 (YES) - 120 - 121 (YES) - 134 - 135. Thus x1 is stored
at the root node.



(b) Element x2: stages 111 - 112 (NO) - 113 (YES) - 114 (NO) - 116 - 117 - 112
(YES) - 120 - 121 (NO) - 123 (YES) - 124 (NO) - 125 - 126 (YES) - 124 (YES) - 134
- 135.

x2 is stored in the leftChildNode of the root node.



(c) Element x3: stages 111 - 112 (NO) - 113 (YES) - 114 (NO) - 116 - 117 - 112
(NO) - 113 (NO) - 118 - 112 (YES) - 120 - 121 (NO) - 123 (YES) - 124 (NO) - 125
- 126 (YES) - 124 (NO) - 125 - 126 (NO) - 127 (NO) - 128 - 124 (NO) - 125 - 126
(YES) - 124 (NO) - 125 - 126 (NO) - 127 (NO) - 128 - 124 (NO) - 125 - 126 (YES)
- 124 (YES) - 134 - 135.

x3 is stored in the rightChildNode of the root node, thereby maintaining the balance of
the tree.



(d) Element x4: stages 111 - 112 (NO) - 113 (YES) - 114 (NO) - 116 - 117 - 112
(NO) - 113 (NO) - 118 - 112 (NO) - 113 (YES) - 114 (NO) - 116 - 112 (YES) - 120
- 121 (NO) - 123 (YES) - 124 (YES) - 134 - 135.

x4 is stored in the baseNode of the tree.



Figure 13 is a flow chart for a deletion algorithm. This is not essential to the invention in its
broadest form but, particularly in the context of the switch, it is desirable to be able to
remove entries selectively, for example as part of an "ageing" process in which new entries
(e.g. MAC addresses) are given a "time stamp" (e.g. a number in a recycling series) and at
appropriate intervals entries which are too old by comparison of the time stamp with the
state of an ageing clock are removed, to make additional room for new entries.
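One way of comparing a recycling time stamp with an ageing clock is sketched below. The length of the series, the age threshold and the function names are all hypothetical; the specification does not state how the comparison is performed, so this is only one plausible reading.

```python
SERIES_LENGTH = 8  # hypothetical: time stamps recycle through 0..7


def age_of(time_stamp, ageing_clock):
    """Age of an entry, allowing for the wrap-around of the series."""
    return (ageing_clock - time_stamp) % SERIES_LENGTH


def entries_to_remove(time_stamps, ageing_clock, maximum_age):
    """Return the keys of entries older than maximum_age."""
    return [key for key, stamp in time_stamps.items()
            if age_of(stamp, ageing_clock) > maximum_age]


# The clock has wrapped round to 1; entry 'a' was stamped at 6, 'b' at 0.
stale = entries_to_remove({"a": 6, "b": 0}, ageing_clock=1, maximum_age=2)
```

Each key returned would then be passed to the deletion algorithm of Figure 13.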

The deletion process begins at stage 150 to set current node to 'delete node' and a
'checking-up' flag to zero. Stage 151 is a test to determine whether the node to be deleted is
at level zero. If it is, then without further tests the element is set to zero (152) and deleted
(153).

If the node is not at level zero then stage 154 augments the current node. If the current
element is zero (155), then stage 158 tests whether the parent node is equal to the delete
node. If it is not, the current node is set to the parent node and tests 155, 158 and 159 recur.
If the parent node is equal to the delete node then, after stage 160, a monitoring stage, if
there has been a check-up, the current element is set to zero (162) and the element is deleted
(163). If the check-up has not been made, then the check-up is set, the current node is
decremented by unity and the process reverts to stage 155.



